-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
vindexes: Efficient unicode hashing #14395
Conversation
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
|
The vendoring also addresses another issue. When Here we vendor, so we essentially set it in stone and keep it always consistent for the future. |
Co-authored-by: Dirkjan Bussink <[email protected]> Signed-off-by: Vicent Marti <[email protected]>
Co-authored-by: Dirkjan Bussink <[email protected]> Signed-off-by: Vicent Marti <[email protected]>
Co-authored-by: Dirkjan Bussink <[email protected]> Signed-off-by: Vicent Marti <[email protected]>
Co-authored-by: Dirkjan Bussink <[email protected]> Signed-off-by: Vicent Marti <[email protected]>
Co-authored-by: Dirkjan Bussink <[email protected]> Signed-off-by: Vicent Marti <[email protected]>
Co-authored-by: Dirkjan Bussink <[email protected]> Signed-off-by: Vicent Marti <[email protected]>
Co-authored-by: Dirkjan Bussink <[email protected]> Signed-off-by: Vicent Marti <[email protected]>
Signed-off-by: Dirkjan Bussink <[email protected]> Signed-off-by: Vicent Marti <[email protected]>
Signed-off-by: Dirkjan Bussink <[email protected]> Signed-off-by: Vicent Marti <[email protected]>
f541c0d
to
aac3a0c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So much more memory efficient!
Signed-off-by: Dirkjan Bussink <[email protected]>
Signed-off-by: Dirkjan Bussink <[email protected]>
Hello! 👋 This Pull Request is now handled by arewefastyet. The current HEAD and future commits will be benchmarked. You can find the performance comparison on the arewefastyet website. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A general question. Does the output for the current hash functions still remain the same?
Yes, this is 100% backwards compatible and there are golden tests that verify it. Otherwise we'd break all existing Vitess deployments by changing the shard position of existing values! |
Description
The current implementation of Unicode vindexes is not efficient because it depends on the
x/text/collate
package, which allocates a linear amount of memory based on the length of the input string being collated. This can be done much more efficiently because the value we require for a vindex is always a hash, so it can be calculated using a constant amount of memory if instead of generating a weight string and hashing it, we write directly into a hasher.Paired on this with @dbussink.
Benchmarks:
cc @mattlord @aquarapid
Related Issue(s)
Fixes #14398
Checklist
Deployment Notes